Abstract: A fundamental problem in cognitive radio systems 
is that the cognitive radio is ignorant of the primary channel 
state and the interference it inflicts on the primary license 
holder. In this paper we assume that the primary transmitter 
sends packets across an erasure channel and the primary 
receiver employs ACK/NACK feedback (ARQ) so that 
erased packets are retransmitted. The cognitive radio can 
eavesdrop on the primary's ARQs. Assuming the primary 
channel states follow a Markov chain, this feedback gives 
the cognitive radio an indication of the primary link quality. 
Based on the ACK/NACK 
received, we devise optimal transmission strategies for the 
cognitive radio so as to maximize a weighted sum of primary 
and secondary throughput. The actual weight used during 
network operation is determined by the degree of protection 
afforded to the primary link. We study a two-state model 
where we characterize a scheme that spans the boundary of 
the primary-secondary rate region. Moreover, we study a three- 
state model where we derive the optimal strategy using dynamic 
programming. We also show via simulations that our optimal 
strategies achieve gains over the simple greedy algorithm for a 
range of primary channel parameters. 

I. Introduction 

Cognitive radio technology is a solution to the problem of 
spectrum under-utilization caused mainly by static spectrum 
allocation. In cognitive radio networks, the licensed users 
coexist with cognitive users, also known as the secondary or 
unlicensed users. The secondary users attempt to utilize the 
resources unused by the primary users adopting procedures 
that aim at protecting the primary network from service 
interruption and interference. 

There has been interest in schemes that make use of the 
feedback of the primary link to predict the behavior of 
the primary user in the future and, in the case of primary 
channel temporal correlation, to gain knowledge about the 
channel between primary transmitter and receiver (e.g., [1], 
[2] and [3]). In [1], the secondary user observes the automatic 
repeat request (ARQ) feedback from the primary receiver. 
The ARQs reflect the packet rate achieved by the primary user. 
The cognitive radio's objective is to maximize secondary 
throughput under the constraint of guaranteeing a certain 
packet rate for the primary user. The main difference between 
our work and [1] is that [1] does not exploit the 
possible channel correlation across time, whereas we assume 
that the primary channel state follows a Markov chain. The 
cognitive transmitter can hence exploit the ARQs to predict 
the primary channel state during the next transmission phase. 
In [2], assuming a temporally correlated channel between 
the primary transmitter and receiver, the cognitive transmit 
power is adjusted based on primary channel state information 
(CSI) feedback. A real-time fading channel model is assumed 
rather than the binary erasure channel that we consider and 
discuss below. However, computing the optimal 
procedure in [2] is prohibitively expensive. 

There has been a series of recent works on cognitive MAC 
for opportunistic spectrum access, e.g., [4], [5], and [6]. In [4], 
an analytical framework for opportunistic spectrum access 
is developed on the basis of Partially Observable Markov 
Decision Processes (POMDP). The framework of POMDP 
is needed given the uncertainties about the quality of the 
primary link, and about primary activity as a result of sensing 
errors. In [4] a slotted primary network was considered, 
where primary activity remains fixed over the duration of 
a slot and switches between idle and active states according 
to a two-state Markovian process. The channel between the 
primary transmitter and receiver is not considered, and the 
feedback used to predict the channel availability is provided 
by the secondary receiver. In [5], the work in [4] is expanded 
to account for energy consumption and spectrum sensing 
duration optimization. In [6], the authors focus on the ARQ 
messages that are used in primary data-link control and are 
overheard by the secondary transmitter. Exploiting the primary 
feedback signals, the secondary terminal can optimize 
its access policy by assessing primary reception quality. The 
primary channel is assumed to be of fixed quality resulting 
in two fixed and known packet error rates corresponding to 
the presence and absence of secondary transmissions. 

In this paper, we consider a primary transmitter that is 
always on. It sends a packet at each time slot, which has 
a fixed duration, and receives an ACK or NACK feedback 
from its receiver. The feedback is received correctly by both 
the primary and secondary transmitters. The channel between 
the primary transmitter and receiver is modeled as a Markov 
process with a finite number of states that determine the 
probability of correct reception. In this paper, we study 
primary link models with two and three states. The state 
of the channel does not change over a slot. The channel 
may switch states at the beginning of each slot according 
to the transition probabilities of the Markov process. The 
cognitive user exploits the ACK/NACK feedback from the 
primary receiver to predict the quality of the primary link. At 
the beginning of each time slot, the secondary user decides 
whether to remain silent and listen to primary feedback, or 
to carry out transmission. The objective is to maximize the 
weighted sum throughput of both the primary and secondary 
links. 

Our contributions in this paper are as follows. For the two- 
state case, we find a closed form expression of the weighted 
sum throughput, and find the strategy that maximizes this 
throughput for any weight. Changing the weight spans the 
boundary of the primary-secondary rate region. For the three- 
state case, we model the problem as a dynamic programming 
problem, and employ Bellman's equation [7] to arrive at the 
optimal strategy. In this paper we focus on the single channel 
case, but our scheme can be readily extended to the multiple 
channel case. 

One of the advantages of our scheme is that the ARQ 
feedback can capture the temporal correlation in the channel. 
The cognitive user can access the primary channel in both 
cases, when the primary channel quality is relatively high 
(primary can transmit successfully regardless of cognitive 
user activity) and when its quality is very low (primary 
transmission fails whether secondary is active or not). This 
advantage cannot be captured in schemes employing spec- 
trum sensing only. 

The paper is organized as follows. The two-state system 
model and assumptions are described in Section II, where 
we find a closed form solution for the optimal throughput 
for primary and secondary networks. In Section III, the 
three-state system model is examined. Numerical results are 
presented in Section IV. Our work is concluded in Section V. 

II. Two-state System model 

Fig. 1. Z-interference erasure channel model. The secondary transmitter, 
when active, causes interference on the primary receiver. The secondary 
receiver, on the other hand, is shadowed from the primary transmitter, 
thereby suffering no interference from it. ENC is the channel encoder, 
whereas DEC is the receive decoder. 

Our proposed model assumes that we have one primary 
link and one secondary link. An illustration of the setup is 
provided in Figure 1. We consider the Z-interference 
channel model [8], where the interference from the primary 
transmitter on the secondary receiver is ignored. The 
Z-interference channel models important applications such as 
the interference caused by macro-cell users on femto-cell 
receivers, which is known in the literature as the "loud 
neighbor" problem. In our context, the primary terminals 
may be close to one another and use small transmission 
power, whereas the cognitive terminals may be far from 
one another and use high power for communication, causing 
considerable interference on the primary link. 

We assume that the activity factor of the primary link is 
unity, i.e., the primary transmitter sends a packet in each 
time slot. The primary link follows a two-state Markov 
chain: it is either in an erasure (E) state or a 
non-erasure (N) state during each time slot. It switches states 
from one time slot to the next according to a Markovian 
process as shown in Figure 2. The process is specified by 
two parameters $P_{EE}$ and $P_{NE}$, where $P_{EE}$ is the probability 
that the primary network is in erasure in the next time slot 
given that it is in the erasure state in the current slot, and $P_{NE}$ 
is the probability that the primary network is in erasure in 
the next time slot given that it is in the non-erasure state in 
the current slot. The transition probabilities of the Markov 
chain are known a priori. The transition matrix $P$, with rows 
and columns ordered (E, N), is given by 

$$P = \begin{bmatrix} P_{EE} & 1 - P_{EE} \\ P_{NE} & 1 - P_{NE} \end{bmatrix}$$

Fig. 2. Two-state Markov model. 

The erasure state causes the primary transmission to fail, 
while the non-erasure state results in successful packet deliv- 
ery to the primary receiver only when there is no interference 
from the secondary transmitter. That is, if the cognitive user 
decides to transmit in the non-erasure state, its transmission 
causes the erasure of the primary packet. 

The cognitive radio can eavesdrop on the primary ARQ 
through which the secondary transmitter can detect the state 
of the primary link and, consequently, know the erasure 
probability of the next state using the transition probabilities. 
However, if the cognitive radio decides to transmit at a 
certain time slot, it causes the primary packet to be erased 
at this time slot. The secondary user then overhears a 
negative acknowledgment (NACK) from the primary receiver 
no matter what the state of the primary channel is. This 
means that when the cognitive user transmits, it becomes 
unaware of the primary link state. 
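
The observation model described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `overheard_feedback` and `inferred_state` are hypothetical and do not come from the original work. The logic follows the two-state rules stated above, namely that an ACK is overheard only when the channel is in the N state and the secondary is silent.

```python
def overheard_feedback(channel_state, secondary_transmits):
    """Return the ARQ message the secondary overhears in one slot.

    channel_state is "E" (erasure) or "N" (non-erasure).
    """
    if channel_state not in ("E", "N"):
        raise ValueError("channel_state must be 'E' or 'N'")
    primary_success = channel_state == "N" and not secondary_transmits
    return "ACK" if primary_success else "NACK"


def inferred_state(feedback, secondary_transmits):
    """Return the channel state revealed by the feedback, or None if hidden.

    When the secondary transmits, a NACK is overheard regardless of the
    channel state, so nothing is learned in that slot.
    """
    if secondary_transmits:
        return None
    return "N" if feedback == "ACK" else "E"
```

The second function makes the cost of transmitting explicit: the return value is `None` in every slot in which the secondary is active.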

Our objective is to choose the transmission strategy that 
maximizes the weighted sum throughput Thr given by 

$$\mathrm{Thr} = w R_p + (1 - w) R_s \tag{1}$$

where $R_p$ and $R_s$ are the mean primary and secondary 
throughput, respectively, and $0 < w < 1$ is a weighting factor 
that determines the relative importance of the two rates. 
In order to protect the primary user from interference and 
service interruption, parameter $w$ can be chosen close to one. 
The optimization problem has an exploration-exploitation 
tradeoff aspect. The tradeoff is between the cognitive user 
activity which maximizes the secondary user throughput, 
and cognitive user silence which gives the secondary user 
knowledge about the channel state information of the primary 
link through the ARQ feedback. 

The primary reward, if the primary succeeds in delivering 
one packet through the binary erasure channel, is $r_p$. The 
secondary reward, if the secondary decides to transmit a 
packet, is $r_s$. Note that we can take account of any possible 
packet loss in the secondary channel when we calculate 
the value of $r_s$. The expected primary throughput at time 
slot $t$ estimated by the secondary transmitter is given by 
$r_p(1 - p_t)$, where $p_t$ is the secondary belief that the channel 
is in erasure at time $t$. The belief is updated from one time 
slot to another according to the following. 

$$p_t = \begin{cases}
P_{EE} & \text{if current state is erasure} \\
P_{NE} & \text{if current state is non-erasure} \\
p_{t-1} P_{EE} + (1 - p_{t-1}) P_{NE} & \text{if current state is unknown}
\end{cases}$$

Note that the third possibility occurs when the secondary 
transmits. The expected secondary throughput at time slot $t$ 
is given by $r_s I_t$, where $I_t$ is an indicator function given by 

$$I_t = \begin{cases}
0 & \text{if the secondary is silent at time slot } t \\
1 & \text{if the secondary is active at time slot } t
\end{cases}$$

A. Throughput Maximizing Scheme 

We assume that $P_{EE} > P_{NE}$, making the belief $p_t$ a 
monotonic function of time as long as the secondary user 
is transmitting [9]. This can be readily seen by solving the 
first order difference equation governing the evolution of $p_t$ 
to obtain 

$$p_t = (P_{EE} - P_{NE})^k \, p_{t-k} + \left[1 - (P_{EE} - P_{NE})^k\right] P(E) \tag{2}$$

where $p_{t-k}$ is the probability of being in erasure at time slot 
$t - k$, the secondary transmits during the $k$ consecutive 
slots that follow, and $P(E) = P_{NE} / (1 + P_{NE} - P_{EE})$ is the 
steady state erasure probability. It is clear that if 
$P_{EE} - P_{NE} > 0$, the belief $p_t$ is a monotonic function of 
time; otherwise the term $(P_{EE} - P_{NE})^k$ oscillates between 
positive and negative values. 

If $w r_p (1 - P_{EE}) > (1 - w) r_s$, the optimal secondary 
strategy is to listen always, because the inequality also implies 
$w r_p (1 - P_{NE}) > (1 - w) r_s$, which means that regardless of 
the actual system state, whether it is E or N, the expected 
primary throughput is greater than the expected secondary 
throughput. Similarly, if $w r_p (1 - P_{NE}) < (1 - w) r_s$, the 
optimal secondary strategy is to transmit always. For any 
other condition, the optimal secondary strategy is as follows. 
The secondary transmitter listens as long as an ACK is 
received, because $w r_p (1 - P_{NE}) > (1 - w) r_s$. In that case, 
maximizing the throughput in the next time slot is optimal, 
since listening does not affect future decisions: the secondary 
still learns the next state while it is silent. Once a NACK is 
received, the secondary transmits $M$ consecutive packets. Thus, 
the maximization problem is equivalent to choosing the number 
$M$ of consecutive secondary transmissions that maximizes the 
weighted sum throughput. 

We can model the problem by a three-state Markov chain 
as shown in Figure 3: 

1. Erasure state and the secondary is silent (E). 

2. Non-erasure state and the secondary is silent (N). 

3. Secondary sends M consecutive packets (S). 

Fig. 3. Throughput Maximizing Scheme. 
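
The three regimes of the scheme and the listen-until-NACK rule can be sketched as follows. This is a minimal illustration under the assumptions of this section ($P_{EE} > P_{NE}$, error-free overheard feedback); the function names `two_state_regime` and `schedule`, and the encoding of actions as "L" (listen) and "T" (transmit), are hypothetical choices made for the sketch.

```python
def two_state_regime(w, r_p, r_s, p_ee, p_ne):
    """Classify the operating regime from the two inequalities in the text."""
    secondary_gain = (1.0 - w) * r_s
    if w * r_p * (1.0 - p_ee) > secondary_gain:
        return "always_listen"
    if w * r_p * (1.0 - p_ne) < secondary_gain:
        return "always_transmit"
    return "listen_until_nack_then_transmit_M"


def schedule(channel_states, burst_length):
    """Return one action per slot for the intermediate regime.

    channel_states is a sequence of "E"/"N" values. The secondary listens
    until it overhears a NACK (channel in "E" while it is silent), then
    transmits for burst_length consecutive slots before listening again.
    """
    actions = []
    burst_left = 0
    for state in channel_states:
        if burst_left > 0:
            actions.append("T")
            burst_left -= 1
        else:
            actions.append("L")
            if state == "E":  # NACK overheard while silent
                burst_left = burst_length
    return actions
```

For example, `schedule("NNENNNEE", 2)` listens in the first three slots, transmits in the two slots after the erasure, and then resumes listening.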

When the Markov chain is in the (N) state, the primary 
achieves a throughput of $r_p$. When it is in the (S) state, 
the secondary achieves a throughput of $M r_s$ as the system 
remains in this state for $M$ time slots. 

In order to find an expression of the throughput as a 
function of $M$, we find the stationary distribution of 
the Markov chain. Let the steady state probabilities of the 
(N), (E) and (S) states be $P_N^{ss}$, $P_E^{ss}$ and $P_S^{ss}$, respectively; 
then 

$$P_N^{ss} = \frac{1 - T_M(P_{EE})}{1 + 2 P_{NE} - T_M(P_{EE})} \tag{3}$$

$$P_E^{ss} = P_S^{ss} = \frac{P_{NE}}{1 + 2 P_{NE} - T_M(P_{EE})} \tag{4}$$

where $T_M(P_{EE})$ is the probability of erasure at time slot $t$ 
given that the state of the Markov chain at time slot $t - M$ was 
erasure. We can find $T_M(P_{EE})$ from the two-state Markov 
chain: 

$$T_M(P_{EE}) = \frac{P_{NE} + (P_{EE} - P_{NE})^{M+1} (1 - P_{EE})}{1 + P_{NE} - P_{EE}} \tag{5}$$

Recall that we assume a positively correlated channel with 
$P_{EE} > P_{NE}$. 
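
Equations (3) to (5) can be checked numerically with a short sketch. It assumes that the exponent $M+1$ in Equation (5) corresponds to $M+1$ one-step transitions of the two-state chain starting from the erasure state; under that reading the closed form agrees with direct iteration of the one-step recursion. The function names are hypothetical.

```python
def t_m(p_ee, p_ne, m):
    """Closed form of Equation (5), with exponent m + 1 as printed."""
    lam = p_ee - p_ne
    return (p_ne + lam ** (m + 1) * (1.0 - p_ee)) / (1.0 + p_ne - p_ee)


def erasure_prob_after(steps, p_ee, p_ne, p_start=1.0):
    """Apply p <- p * P_EE + (1 - p) * P_NE `steps` times from p_start."""
    p = p_start
    for _ in range(steps):
        p = p * p_ee + (1.0 - p) * p_ne
    return p


def stationary_distribution(p_ee, p_ne, m):
    """Equations (3) and (4): returns (P_N, P_E, P_S) for the N/E/S chain."""
    t = t_m(p_ee, p_ne, m)
    denom = 1.0 + 2.0 * p_ne - t
    p_e = p_ne / denom
    return (1.0 - t) / denom, p_e, p_e
```

The returned probabilities sum to one and satisfy the balance condition $P_E^{ss} = P_N^{ss} P_{NE} + P_S^{ss} T_M(P_{EE})$, which is how Equation (4) follows from the chain in Figure 3.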



A closed form expression for the primary throughput $R_p$ 
and the secondary throughput $R_s$ can be written as: 

(6) 

(7) 

We want to find $M$ that maximizes $\mathrm{Thr}(M) = w R_p + (1 - w) R_s$. 
Note that the optimal value of $M$ depends 
on the weight $w$. This scheme spans a number of points 
on the outer bound of the capacity region with the optimal 
values of $M$ (integer numbers) that maximize the weighted 
sum throughput. The outer bound of the capacity region here 
is piecewise linear, and can be achieved by time division 
multiplexing between the different values of $M$. 
Two remarks are in order here: 

1. Using the properties of the function $V^K(p)$ defined later 
in Equation (14), it can be shown that the optimal strategy 
is a threshold-based policy on the belief $p_t$, and since the 
belief $p_t$ is monotonic in $M$, finding the threshold amounts 
to finding the value of $M$ that maximizes the throughput 
expression. 

2. It can be shown that the throughput is a quasi-concave 
function of M, and through some algebraic manipulations, 
one can arrive at the value of M that maximizes the 
throughput. This can be shown by subtracting Thr(M) from 
Thr(M + 1). Treating M as a continuous variable, we can 
show that this difference has only one positive finite root that 
is greater than or equal to unity. By finding this root, we find 
the value of the unique optimal M. 

We will present the details of these proofs in an extended 
version of this work. 
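
The search for the best $M$ can be illustrated with a brute-force sketch. The throughput expressions used here are an assumption made for the sketch: they follow from a renewal-reward argument over the chain of Figure 3, in which visits to N and E last one slot and visits to S last $M$ slots, weighted by the stationary probabilities of Equations (3) and (4). They may differ in form from Equations (6) and (7). The function names and the cap `m_max` are hypothetical.

```python
def t_m(p_ee, p_ne, m):
    """Equation (5), with exponent m + 1 as printed."""
    lam = p_ee - p_ne
    return (p_ne + lam ** (m + 1) * (1.0 - p_ee)) / (1.0 + p_ne - p_ee)


def throughputs(m, p_ee, p_ne, r_p=1.0, r_s=1.0):
    """Per-slot (R_p, R_s) under the renewal-reward assumption above."""
    t = t_m(p_ee, p_ne, m)
    # Relative visit frequencies of N, E and S from Equations (3)-(4);
    # the common denominator cancels in the ratios below.
    visits_n, visits_e, visits_s = 1.0 - t, p_ne, p_ne
    slots = visits_n + visits_e + m * visits_s
    return r_p * visits_n / slots, r_s * m * visits_s / slots


def weighted_throughput(m, w, p_ee, p_ne, r_p=1.0, r_s=1.0):
    rate_p, rate_s = throughputs(m, p_ee, p_ne, r_p, r_s)
    return w * rate_p + (1.0 - w) * rate_s


def best_burst_length(w, p_ee, p_ne, r_p=1.0, r_s=1.0, m_max=200):
    """Exhaustive search over M = 0..m_max (m_max is an arbitrary cap)."""
    return max(range(m_max + 1),
               key=lambda m: weighted_throughput(m, w, p_ee, p_ne, r_p, r_s))
```

With $M = 0$ the sketch reduces to always listening, and the primary rate equals $r_p$ times the steady state probability of the non-erasure state, which is a useful sanity check on the assumed expressions.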

B. Dynamic Programming 

For the three-state channel model presented later and for 
the case of multiple channels, we may not be able to find a 
closed-form expression for the throughput. In those cases, we 
propose to use dynamic programming techniques to arrive 
at the optimal strategy. In this section we will present the 
dynamic programming approach to the two-state case. 

Assuming an infinite horizon optimization, a dynamic 
programming argument shows that the state of the system 
can be fully parameterized by the belief that the channel 
is in erasure in the next time slot, $p$, where we have dropped 
the time dependence. Hence the action taken by the cognitive 
user depends only on p. The belief state p can be updated 
according to one of the following three cases depending on 
the action taken by the secondary user and the corresponding 
outcome. We follow here the notation presented in [5]. 

Case 1: Secondary user is silent and a positive acknowledgment 
(ACK) is received from the primary network. The 
ACK implies that the primary network has been in the 
non-erasure state, and the primary receiver has succeeded in 
decoding the packet. Therefore, the secondary belief that the 
channel would be in erasure during the next time slot is 

$$L_1(p) = P_{NE} \tag{8}$$

where $L_k(p)$ is the update expression for $p$ for the $k$th case. 
Each case is a certain combination of secondary decision and 
observation. 

Case 2: Secondary user is silent and a negative acknowledgment 
(NACK) is received from the primary network. This 
implies that the primary network has been in the erasure state 
and the sent packet has not been delivered successfully. Thus, 

$$L_2(p) = P_{EE} \tag{9}$$

Case 3: Secondary user transmits. The probability of erasure 
is updated by the Markovian property as follows 

$$L_3(p) = p P_{EE} + (1 - p) P_{NE} \tag{10}$$
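
The three update rules can be written as small functions, which also gives a direct check of Equation (2): applying the Case 3 update $k$ times should match the closed form. This is a sketch with hypothetical function names; the steady state erasure probability $P_{NE} / (1 + P_{NE} - P_{EE})$ is the standard stationary probability of a two-state chain.

```python
def l1(p, p_ee, p_ne):
    """Case 1 (silent, ACK): Equation (8)."""
    return p_ne


def l2(p, p_ee, p_ne):
    """Case 2 (silent, NACK): Equation (9)."""
    return p_ee


def l3(p, p_ee, p_ne):
    """Case 3 (secondary transmits): Equation (10)."""
    return p * p_ee + (1.0 - p) * p_ne


def belief_after_burst(p_start, k, p_ee, p_ne):
    """Closed form of Equation (2) after k consecutive transmissions."""
    lam = p_ee - p_ne
    steady = p_ne / (1.0 + p_ne - p_ee)
    return lam ** k * p_start + (1.0 - lam ** k) * steady
```

Starting from the belief `l2(...)` that follows a NACK and applying `l3` repeatedly traces the monotonic decay of the belief toward the steady state value during a transmission burst.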



We denote the weighted expected instantaneous throughput 
when the secondary user listens by $G_1(p)$, which is given by 

$$G_1(p) = w r_p (1 - p) \tag{11}$$

For the case of the secondary transmitting, we denote the 
weighted expected throughput by 

$$G_2(p) = (1 - w) r_s \tag{12}$$

A greedy scheme would just compare $G_1(p)$ with $G_2(p)$: 
if $G_2(p) > G_1(p)$, the secondary user decides to transmit; 
otherwise it remains silent. The expected instantaneous 
reward is: 

$$R(p, t) = \begin{cases}
G_1(p) & \text{if the secondary is silent at time } t \\
G_2(p) & \text{if the secondary is active at time } t
\end{cases}$$



The optimal strategy, on the other hand, takes into account 
the expected future reward. The optimal strategy is the 
strategy that maximizes the following discounted reward 
function [5] 

$$E\left[ \sum_{n=t}^{K+t-1} \alpha^{n-t} R(p_n, n) \,\middle|\, p_t = p \right] \tag{13}$$

where $0 < \alpha < 1$ is a discounting factor and $1 \le K \le \infty$ is 
the control horizon. As $\alpha$ decreases, the secondary user puts 
more emphasis on its short-term future gains. 

Following the definitions in [5], let $V^K(p)$ denote the 
maximum achievable discounted reward function. When 
$K < \infty$, $V^K(p)$ satisfies the following Bellman equation 
[7]: 

$$V^K(p) = \max\left\{ w r_p (1 - p) + \alpha (1 - p) V^{K-1}(L_1(p)) + \alpha p V^{K-1}(L_2(p)),\; (1 - w) r_s + \alpha V^{K-1}(L_3(p)) \right\} \tag{14}$$

where 

$$V^1(p) = \max\left\{ w r_p (1 - p),\; (1 - w) r_s \right\} \tag{15}$$

When $K = \infty$, $V^K(p) = V^{K-1}(p) = V(p)$, which satisfies 
the following Bellman equation: 

$$V(p) = \max\left\{ w r_p (1 - p) + \alpha (1 - p) V(L_1(p)) + \alpha p V(L_2(p)),\; (1 - w) r_s + \alpha V(L_3(p)) \right\} \tag{16}$$

We solve Equation (16) iteratively by approximating the 
value function at a finite number of belief values on a 
grid (see, for instance, [7] and [10]). The value function is 
initialized and then (14) is used to update it. For values of $p$ 
not belonging to the grid, interpolation or extrapolation is used. 
After convergence, the secondary terminal decides whether 
to transmit or listen based on the term that maximizes $V(p)$ 
at each value of $p$. 
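
A minimal value iteration over a belief grid, in the spirit of the procedure described above, might look as follows. It is a sketch under stated assumptions rather than the implementation used for the reported results: the uniform grid, linear interpolation, zero initialization, stopping tolerance and default discount factor are all choices made here for illustration, and the function name is hypothetical.

```python
def value_iteration(w, p_ee, p_ne, r_p=1.0, r_s=1.0, alpha=0.9,
                    grid_size=101, tol=1e-9, max_iter=10000):
    """Approximate V(p) on a uniform grid and return (grid, values, policy)."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    values = [0.0] * grid_size

    def interp(v, p):
        # Linear interpolation on the uniform grid; p is clamped to [0, 1].
        x = min(max(p, 0.0), 1.0) * (grid_size - 1)
        lo = min(int(x), grid_size - 2)
        frac = x - lo
        return (1.0 - frac) * v[lo] + frac * v[lo + 1]

    def action_values(v, p):
        # First term of the max: listen, then observe ACK (L1) or NACK (L2).
        listen = (w * r_p * (1.0 - p)
                  + alpha * (1.0 - p) * interp(v, p_ne)
                  + alpha * p * interp(v, p_ee))
        # Second term of the max: transmit, belief evolves by L3.
        transmit = ((1.0 - w) * r_s
                    + alpha * interp(v, p * p_ee + (1.0 - p) * p_ne))
        return listen, transmit

    for _ in range(max_iter):
        new_values = [max(action_values(values, p)) for p in grid]
        delta = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if delta < tol:
            break

    policy = []
    for p in grid:
        listen, transmit = action_values(values, p)
        policy.append("listen" if listen >= transmit else "transmit")
    return grid, values, policy
```

A call such as `value_iteration(0.6, 0.99, 0.01)` returns the grid, the approximate value function and the action chosen at each grid point. Setting `alpha=0` removes the future terms, so the resulting policy coincides with the greedy comparison of $G_1(p)$ and $G_2(p)$.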

III. Three-state system model 

In this section, we extend our previous channel model to 
a three-state model where the primary channel now follows 
a three-state Markov chain whose states are named Bad (B), 
Good (G) and Very good (Vg) with transition matrix $P$ where 

$$P = \begin{bmatrix}
P_{BB} & P_{BG} & P_{BVg} \\
P_{GB} & P_{GG} & P_{GVg} \\
P_{VgB} & P_{VgG} & P_{VgVg}
\end{bmatrix}$$

If the secondary is listening, the primary user can deliver its 
packet if the channel state is G or Vg. But if the secondary 
is transmitting, the primary transmission succeeds only in 
the Vg state. This means that the primary and secondary can 
both transmit successfully at the same time in the Vg state. 

We can also apply dynamic programming on that system 
with three channel states to arrive at the optimal decisions 
for the secondary, whether to transmit or to listen, at any 
situation to maximize the weighted sum of the primary and 
secondary throughput. 

Here we parameterize the belief state by two parameters p 
and q, where p is the probability that the primary network is 
in the G state in the next time slot and q is the probability that 
the primary network is in the Vg state in the next time slot. 
This implies that the probability that the primary network is 
in the B state is (1—p — q). After each time slot, depending 
on the action taken by secondary user and the corresponding 
feedback, p and q can be updated according to one of the 
following four cases. 

Case 1: Secondary user is silent and a NACK is received 
from the primary network. The NACK during secondary 
silence implies that the primary network has been in the B state 
and, thus, the primary receiver has failed to receive the packet. 
Therefore, the belief state in the next time slot is: 

$$L_1(p) = P_{BG} \tag{17}$$

$$L_1(q) = P_{BVg} \tag{18}$$

where, as in the two-state case, $L_k(p)$ and $L_k(q)$ are the 
update expressions for $p$ and $q$, respectively, for the $k$th case. 

Case 2: Secondary user is silent and an ACK is received from 
the primary network. The primary network could be in the G state 
with probability $p/(p+q)$ or in the Vg state with probability 
$q/(p+q)$. The belief state in the next time slot is: 

$$L_2(p) = \frac{p}{p+q} P_{GG} + \frac{q}{p+q} P_{VgG} \tag{19}$$

$$L_2(q) = \frac{p}{p+q} P_{GVg} + \frac{q}{p+q} P_{VgVg} \tag{20}$$

Case 3: Secondary user is transmitting and an ACK is received 
from the primary network. The ACK during secondary 
activity implies that the primary network has been in the Vg state. 
Therefore, the belief state in the next time slot is: 

$$L_3(p) = P_{VgG} \tag{21}$$

$$L_3(q) = P_{VgVg} \tag{22}$$

Case 4: Secondary user is transmitting and a NACK is 
received from the primary network. The primary network could 
be in the G state with probability $p/(1-q)$ or in the B state with 
probability $(1-p-q)/(1-q)$. The belief state in the next time 
slot is: 

$$L_4(p) = \frac{p}{1-q} P_{GG} + \frac{1-p-q}{1-q} P_{BG} \tag{23}$$

$$L_4(q) = \frac{p}{1-q} P_{GVg} + \frac{1-p-q}{1-q} P_{BVg} \tag{24}$$

Let $Q_i(p,q)$, $i = 1,2,3,4$, denote the probability that case 
$i$ above happens: 

$$Q_1(p,q) = 1 - p - q \tag{25}$$

$$Q_2(p,q) = p + q \tag{26}$$

$$Q_3(p,q) = q \tag{27}$$

$$Q_4(p,q) = 1 - q \tag{28}$$
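
The four cases share one structure: condition the current belief on the states compatible with the observation, then propagate one step through the transition matrix. A compact sketch of that structure is given below. The function name, the nested-dictionary layout `P[from_state][to_state]` and the extra return value are hypothetical choices made for illustration.

```python
def update_belief(p, q, secondary_transmits, ack, P):
    """Return (next_p, next_q, prob) for one slot of the three-state model.

    p and q are the beliefs of being in G and Vg; prob is Q_i(p, q), the
    probability of the observation given the chosen action.
    """
    b = 1.0 - p - q
    # Unnormalised posterior over the current state for each case.
    if not secondary_transmits and not ack:      # Case 1: state was B
        weights = {"B": b, "G": 0.0, "Vg": 0.0}
    elif not secondary_transmits and ack:        # Case 2: state was G or Vg
        weights = {"B": 0.0, "G": p, "Vg": q}
    elif secondary_transmits and ack:            # Case 3: state was Vg
        weights = {"B": 0.0, "G": 0.0, "Vg": q}
    else:                                        # Case 4: state was B or G
        weights = {"B": b, "G": p, "Vg": 0.0}
    prob = sum(weights.values())
    if prob <= 0.0:
        raise ValueError("observation has zero probability under this belief")
    next_p = sum(x * P[s]["G"] for s, x in weights.items()) / prob
    next_q = sum(x * P[s]["Vg"] for s, x in weights.items()) / prob
    return next_p, next_q, prob
```

Dividing by `prob` reproduces the normalizing factors $p+q$ and $1-q$ in Equations (19) to (24), and the guard makes explicit that an update is undefined for an observation that cannot occur.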



The parameters $p$ and $q$ characterizing the belief state are 
updated by one of the previous four cases. If the secondary 
user is listening, the expected current gain can be calculated 
as: 

$$G_1(p,q) = w r_p (p + q) \tag{29}$$

But if the secondary user is transmitting, the expected current 
gain is: 

$$G_2(p,q) = (1 - w) r_s + w r_p q \tag{30}$$

The expected current reward is: 

$$R(p, q, t) = \begin{cases}
G_1(p,q) & \text{if the secondary is silent at time } t \\
G_2(p,q) & \text{if the secondary is active at time } t
\end{cases}$$

The optimal strategy is the strategy that maximizes the 
following discounted reward function 

$$E\left[ \sum_{n=0}^{K-1} \alpha^n R(p_n, q_n, t_n) \,\middle|\, p_0 = p,\; q_0 = q \right] \tag{31}$$

$V^K(p,q)$ satisfies the following Bellman equation [7]: 

$$V^K(p,q) = \max\Big\{ w r_p (p + q) + \alpha \sum_{i=1,2} Q_i(p,q) V^{K-1}(L_i(p), L_i(q)),\; (1 - w) r_s + w r_p q + \alpha \sum_{i=3,4} Q_i(p,q) V^{K-1}(L_i(p), L_i(q)) \Big\} \tag{32}$$

where 

$$V^1(p,q) = \max\left\{ w r_p (p + q),\; (1 - w) r_s + w r_p q \right\} \tag{33}$$
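
For small horizons, the finite-horizon recursion of Equations (32) and (33) can be evaluated directly, without a belief grid. The sketch below is illustrative only: the function name, the nested-dictionary layout `P[from_state][to_state]`, the default discount factor and the choice to skip zero-probability outcomes are assumptions made here.

```python
STATES = ("B", "G", "Vg")


def finite_horizon_value(p, q, horizon, P, w, r_p=1.0, r_s=1.0, alpha=0.9):
    """Return (V^K(p, q), best first action) by direct recursion.

    The number of recursive calls grows like 4 ** (horizon - 1), so this
    is only practical for small horizons.
    """
    gain = {"listen": w * r_p * (p + q),
            "transmit": (1.0 - w) * r_s + w * r_p * q}
    if horizon > 1:
        b = max(0.0, 1.0 - p - q)
        # Unnormalised posteriors over (B, G, Vg) for each action's outcomes.
        outcomes = {"listen": [(b, 0.0, 0.0), (0.0, p, q)],     # Cases 1, 2
                    "transmit": [(0.0, 0.0, q), (b, p, 0.0)]}   # Cases 3, 4
        for action, posteriors in outcomes.items():
            for post in posteriors:
                prob = sum(post)  # Q_i(p, q)
                if prob <= 0.0:
                    continue      # outcome cannot occur under this belief
                next_p = sum(x * P[s]["G"] for x, s in zip(post, STATES)) / prob
                next_q = sum(x * P[s]["Vg"] for x, s in zip(post, STATES)) / prob
                value, _ = finite_horizon_value(next_p, next_q, horizon - 1,
                                                P, w, r_p, r_s, alpha)
                gain[action] += alpha * prob * value
    best = "listen" if gain["listen"] >= gain["transmit"] else "transmit"
    return gain[best], best
```

With `horizon=1` the function returns the larger of the two immediate gains, as in Equation (33); increasing the horizon shows how the value of information enters the decision.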

When $K = \infty$, $V(p,q)$ denotes the maximum achievable 
discounted reward function. $V(p,q)$ satisfies the following 
Bellman equation [7]: 

$$V(p,q) = \max\Big\{ w r_p (p + q) + \alpha \sum_{i=1,2} Q_i(p,q) V(L_i(p), L_i(q)),\; (1 - w) r_s + w r_p q + \alpha \sum_{i=3,4} Q_i(p,q) V(L_i(p), L_i(q)) \Big\} \tag{34}$$

IV. Simulation Results 

A. Two-State model 

Fig. 4. Two-state weighted sum throughput. 

Fig. 5. Optimal number of consecutive transmitted packets. 

For the obtained simulation results, the system parameters 
are as follows: $P_{EE} = 0.99$, $P_{NE} = 0.01$, $r_p = 1$ and $r_s = 1$. 
The weighted sum of the primary and secondary throughput 
is shown in Figure 4. Figure 5 shows the optimal number 
of consecutive secondary transmitted packets $M$ 
versus the weighting factor $w$. We can 
see from Figure 5 that in the greedy scheme, the secondary 
transmitter always transmits ($M$ is infinite) as long as 
$w < 0.67$, which explains the sudden change in the overall 
throughput at $w = 0.67$ in Figure 4. The optimal strategy 
has this threshold at $w = 0.5$, which means that the optimal 
strategy places more value on learning the channel state, 
rather than transmitting, in order to maximize its future reward. 
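
One way to read the two thresholds is sketched below. It assumes that, during a long burst, the greedy belief relaxes to the steady state erasure probability, so that greedy keeps transmitting indefinitely when $(1 - w) r_s > w r_p (1 - P(E))$, and it uses the always-transmit condition $w r_p (1 - P_{NE}) < (1 - w) r_s$ of Section II for the optimal strategy. The function names are hypothetical, and the computation is an interpretation of the reported values rather than a reproduction of the simulation.

```python
def steady_state_erasure(p_ee, p_ne):
    """Stationary erasure probability of the two-state chain."""
    return p_ne / (1.0 + p_ne - p_ee)


def greedy_always_transmit_threshold(p_ee, p_ne, r_p=1.0, r_s=1.0):
    """Largest w for which greedy never stops transmitting once it starts.

    Solves (1 - w) * r_s = w * r_p * (1 - P(E)) for w.
    """
    success = 1.0 - steady_state_erasure(p_ee, p_ne)
    return r_s / (r_s + r_p * success)


def optimal_always_transmit_threshold(p_ne, r_p=1.0, r_s=1.0):
    """Boundary of the always-transmit regime.

    Solves w * r_p * (1 - P_NE) = (1 - w) * r_s for w.
    """
    return r_s / (r_s + r_p * (1.0 - p_ne))
```

For the parameters above, the first function evaluates to about 0.667 and the second to about 0.503, in line with the thresholds of 0.67 and 0.5 quoted from the figures.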



Fig. 6. Primary-secondary rate region. 

Our proposed scheme spans the boundary of the primary-secondary 
rate region at a number of points where $M$ has an 
integer value. The piecewise linear connection between these 
points can be achieved by time division multiplexing between 
different integer values of $M$. For system parameters $r_p = 1$, 
$r_s = 1$ with the same transition probabilities, the rate 
region is shown in Figure 6. 

B. Three-State model 

The system parameters are as follows: $P_{BB} = P_{GG} = 
P_{VgVg} = 0.9$, $P_{BG} = P_{GB} = P_{VgG} = 0.05$, $r_p = 1$ and 
$r_s = 1$. The weighted sum of the primary and secondary 
throughput is shown in Figure 7. 

Fig. 7. Three-state weighted sum throughput. 

V. Conclusion 

In this paper, the ACK/NACK feedback from the primary 
receiver is exploited by the secondary transmitter in order 
to find optimal transmission strategies that maximize the 
weighted sum of primary and secondary throughput. For 
the two-state system model, we have derived a closed-form 
expression of the optimal overall throughput. We have 
extended the problem to the case of three channel states and 
used dynamic programming to obtain the optimal secondary 
policy. Our future work includes the study of multiple 
primary channels under the assumption of various secondary 
sensing and access capabilities, as well as the study of 
multiple secondary users collaborating and competing for 
transmission opportunities. 




